Data Warehouse
Ressources
What is a Data Warehouse
A data warehouse is a centralized repository that stores structured, semi-structured data from multiple sources in a predefined schema, specifically designed for data analysis, BI, and reporting.
It contains historical data that has been cleaned, transformed, and organized according to a specific business model or requirement.
Data must be processed and transformed (ETL - Extract, Transform, Load) or ELT (Extract, Load, Transform) before being stored in the warehouse.
In general, data warehouses are optimized for Gigabytes to Petabytes of data.

Why use a Data Warehouse
- Optimized for Analysis: Data is structured and organized for optimal query performance and business intelligence.
- Data Consistency: Enforces data quality and provides a single source of truth for business reporting.
- Historical Analysis: Maintains historical data for trend analysis and business intelligence.
- Business Focus: Designed specifically for business users and analysts to access standardized data for reporting and decision-making.
Challenges of Data Warehouses
- Schema Rigidity: Predefined schema makes it difficult to adapt to new data types or changing business requirements.
- Cost: High setup and maintenance costs due to specialized hardware and software requirements.
- Time to Value: Implementing ETL processes and maintaining data models can be time-consuming.
- Limited Data Types: Primarily designed for structured data, making it difficult to handle unstructured or semi-structured data formats.
OLAP Systems
Data warehouses are built on OLAP (Online Analytical Processing) systems, which differ significantly from OLTP (Online Transaction Processing) systems used in operational databases.

Key Differences
| Characteristic | OLTP | OLAP |
|---|---|---|
| Purpose | Daily transactions processing | Data analysis and reporting |
| Data Model | Normalized | Denormalized |
| Data Updates | Continuous updates | Periodic batch updates |
| Query Type | Simple transactions, few records | Complex queries, many records |
| Response Time | Milliseconds | Seconds to minutes |
| Space Requirements | Gigabytes | Terabytes to Petabytes |
| Data History | Current data (days/months) | Historical data (years) |
| Optimization | For insert/update | For complex queries |